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Abstract 

The Bethe-Salpeter eigenvalue problem is a dense structured eigenvalue problem arising 
from discretized Bethe-Salpeter equation in the context of computing exciton energies and 
states. A computational challenge is that at least half of the eigenvalues and the associated 
eigenvectors are desired in practice. We establish the equivalence between Bethe-Salpeter 
eigenvalue problems and real Hamiltonian eigenvalue problems. Based on theoretical analysis, 
structure preserving algorithms for a class of Bethe-Salpeter eigenvalue problems are proposed. 
We also show that for this class of problems all eigenvalues obtained from the Tamm-Dancoff 
approximation are overestimated. In order to solve large scale problems of practical interest, 
we discuss parallel implementations of our algorithms targeting distributed memory systems. 
Several numerical examples are presented to demonstrate the efficiency and accuracy of our 
algorithms. 

Keywords: Bethe-Salpeter equation, Tamm-Dankoff approximation, Hamiltonian eigen¬ 
value problems, structure preserving algorithms, parallel algorithms 


1 Introduction 

The absorption of a photon by a molecular system or solid can promote an electron in an occupied 
single-particle state (or orbital) to an unoccupied state while keeping the charge neutrality. In 
the physics community, this process is often described as the simultaneous creation of a negatively 
charged quasielectron (or simply electron) and a positively charged quasihole (or hole) in the ma¬ 
terial that was originally in the lowest energy electronic configuration (the ground state). Upon 
absorbing a photon, the entire molecular or extended system is in an excited state that contains a 
correlated electron-hole pair, which is referred to as an exciton. The amount of energy required to 
trigger this excitation gives an important characterization of the material. In many-body physics, 
a two-particle collective excitation is often described by a two-particle Green’s function, with the 
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excitation energy level being a pole of this function. It has been shown that the two-particle Green’s 
function satisfies an equation often known as the Bethe-Salpeter equation (BSE) |28j . 

The poles of the two-particle Green’s function can be obtained by computing the eigenvalues 
of a Hamiltonian operator % associated with this Green’s function. It can be shown that, with 
an appropriate discretization scheme, the finite dimensional representation of the Bethe-Salpeter 
Hamiltonian has the following block structure 


H = 


■ A 
-B 


B 

-A ’ 


( 1 ) 


where A,B G C"^”, with 

A = A^, B = B'^. ( 2 ) 

Here we use A^ to denote the conjugate transpose of A and B^ to denote the transpose of B. 
We will refer to an eigenvalue problem of the form Q with the additional symmetry given by 
Equation ([^ as a Bethe-Salpeter eigenvalue problem. 

In principle, we are interested in all possible excitation energies, although some excitations are 
more likely to occur than others. Such likelihood can often be measured in term of what is known 
as the spectral density or density of states of i/, which is defined to be the number of eigenvalues 
per unit energy interval [T5], that is. 




(3) 


where Xj’s denote the eigenvalues of H. This formulation requires all eigenvalues of H to be real, 
which is the case for most physical systems. In addition, the optical absorption spectrum 


■(w) 


E 

i=i 


idrXj)iy'^di) 






(4) 


which can be measured in optical absorption experiments, is also of practical interest. Here dr and 
di are dipole vectors, Xj and yj are right and left eigenvectors, respectively, corresponding to the 
positive eigenvalue Xj. To obtain highly accurate representations of and Q, the computation 
of all eigenpairs is required. In addition to the optical absorption spectrum defined in 0, the 
individual pairs of left and right eigenvectors are often desired, since they describe the character of 
each two-particle excited state. 

In general, both A and B are dense. The dimension of these matrices is proportional to the 
product of the number of occupied and unoccupied states, both of which are proportional to the 
number of electrons in the system. Hence it can become quite large for large systems that contain 
many atoms (and electrons). 

We are interested in efficient and reliable parallel algorithms for computing all eigenvalues and 
the corresponding eigenvectors of Q. Because iJ is a non-Hermitian matrix, we need to compute 
both the left and the right eigenvectors. 

Although it is possible to treat 77 as a general non-Hermitian matrix and use existing parallel 
algorithms [HIE] implemented in ScaLAPAGK [5] to solve such a problem, this approach does 
not take advantage of the special structure of the Bethe-Salpeter Hamiltonian and is thus not 
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efficient. Nor does this approach preserve some desirable properties of the eigenvalues and eigen¬ 
vectors. Moreover, the current release of ScaLAPACK only has a subroutine that performs a Schur 
decomposition of a complex non-Hermitian matrix [12] . 

In the following section, we show that H belongs to a class of matrices known as J-symmetric 
matrices whose eigenvalues satisfy a special symmetry property. Although several algorithms have 
been developed for computing the eigenvalues and eigenvectors of this class of matrices, efficient 
parallel implementations of these algorithms are not available, and are not easy to develop. 

In this paper, we develop a special algorithm that can leverage existing parallel computational 
kernels available in the ScaLAPACK software package. Our algorithm is based on the observation 
that computing the eigenpairs of 0 is equivalent to computing the eigenpairs of a real Hamiltonian 
matrix of the form 


, _ rim(A -h B) -Re(A - B) 
~ [Re(A -h B) Im(A - B) 


(5) 


where Re(A) and Im(A) denote the real and imaginary parts of A, respectively. Furthermore, when 
A and B satisfy the property _ 


A 

B 



( 6 ) 


which often holds in practice [3S] , it can be shown that all eigenvalues of \Hr and thus of H are real, 
and they come in positive and negative pairs [mdii. We present an efficient parallel algorithm for 
computing the positive eigenvalues and the corresponding right eigenvectors. The algorithm makes 
use of existing kernel in ScaLAPACK as well as a new kernel we developed for computing eigenpairs 
of a skew symmetric tridiagonal matrix. A simple transformation can be used to obtain the right 
eigenvectors associated with the negative eigenvalues, as well as all left eigenvectors. 

When H is real, which is the case for systems with real-space inversion symmetry, the Bethe- 
Salpeter eigenvalue problem can be transformed into a product eigenvalue problem. We propose 
an efficient and accurate parallel algorithm for solving the product eigenvalue problem. 

When facing the challenge of computing the eigenpairs of the non-Hermitian matrix Q, many 
researchers in the physics community choose to drop the off-diagonal blocks, B and —B, and 
compute eigenpairs of the Hermitian matrix A only. This approach is often known as the Tamm- 
Dancoff approximation (TDA) [TOl [23 ES]. We show that when the condition § holds, each 
eigenvalue of A is an upper bound of the corresponding positive eigenvalue of H when all eigenvalues 
of A and H are sorted. Our numerical experiment shows that TDA can introduce a non-negligible 
shift of the spectral density of H, which is consistent with what has been reported in the physics 
literature [BiiaES]. 

The rest of the paper is organized as follows. In Sectionj^ we discuss some basic properties of the 
Bethe-Salpeter eigenvalue problem. Then in Section we develop structure preserving algorithms 
built on these properties as well as the additional assumption Q. Finally, we demonstrate the 
efficiency and accuracy of our proposed algorithms in Section by several examples from physical 
models. 


2 Properties of Bethe—Salpeter eigenvalue problems 

In this section, we examine the special properties of the Bethe-Salpeter Hamiltonian that allow us 
to develop efficient algorithms for computing its eigenpairs. 

Y means that Jf — Y is Hermitian positive definite. 
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2.1 Relation to Hamiltonian eigenvalue problems 


We first show that H belongs to a class of matrices known as J-symmetric matrices. Let J„ be the 
2n X 2n skew-symmetric matrix 


Jn 


0 In 
-In 0 


(7) 


A matrix X G is called a J-symmetric matrix if {XJn)~^ = XJn [2T1|22]. When X is real, 

it is also called a Hamiltonian matrix. By definition, X is J-symmetric if and only if X admits the 
block structure 

y _ Xii Xi2 

^ - [A21 -XJ,_ 

with Ai 2 and A 21 complex symmetric. It can also be verified that this J-symmetric structure is 
preserved under what is called a complex symplectic similarity transformation 


H = S-^HS, 


where S G jg complex symplectic matrix that satisfies S'^J„5' = Jn [22]- These properties 

are key to the development of several numerical algorithms for computing the eigenvalues of dense 
J-symmetric matrices. Examples of these algorithms include the Hamiltonian-Jacobi algorithms [21 
[M] and SR-like algorithms [T^ . 

The Bethe-Salpeter Hamiltonian matrix defined in Q is clearly a J-symmetric matrix because 



■ A 

B ' 


'A B ' 

H = 

-B 

-A 

— 

-B 


Although the algorithms for J-symmetric matrices can be used to solve the Bethe-Salpeter eigen¬ 
value problems, they do not take advantage of the additional symmetry relationship between the 
(1,2) and (2,1) blocks in H. This symmetry leads to the symmetry of A(iJ) as stated in the 
following theorem. 

Theorem 1 ([!]). Let H be of the form 0 with A Hermitian and B symmetric. If A is an 
eigenvalue of H, then —A, A, —A are also eigenvalues of H with the same multiplicity. 

Unfortunately, complex symplectic transformations in general do not preserve the structure 
of Bethe-Salpeter Hamiltonian matrices. To seek a class of fully structure preserving similarity 
transformations, we show in the following theorem that solving a Bethe-Salpeter eigenvalue problem 
is equivalent to solving a real Hamiltonian eigenvalue problem. This is the main theoretical result 
of this paper. 


Theorem 2. A Bethe-Salpeter eigenvalue problem can be reduced to a real Hamiltonian eigenvalue 
problem, and vice versa. 


Proof. Let 


Then 


Q 


V2 


^In 

i7n 



B 

-A 


Q = i 


Im(A -I- B) 
Re(A -k B) 


-Re(A - B) 
Im(A — B) 


-\JnM, 
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where 


M = 


Re(A + B) 
—Im(A + B) 


Im(A — B) 
Re(A - B) 


( 8 ) 


is a real symmetric matrix. Therefore, any Bethe-Salpeter eigenvalue problem can be converted to 
a real Hamiltonian eigenvalue problem. 

On the other hand, let be a real 2n x 2n Hamiltonian matrix of the form 


= 


H 


11 


H 


12 


H21 -H 


11 


where HJ 2 = H 12 , Hji — i? 2 i- We set 


. _ H12-H21 , 

2 2 ’ 

It can be verified that = A, BA = B, and 


„ H12 + H21 

JD = - 1 - 


H, 


QHrQ^ = i 


' A 
-B 


B 

-A 


Therefore, we can convert a real Hamiltonian eigenvalue problem to a Bethe-Salpeter eigenvalue 
problem. This completes the proof. □ 

Theorem fully characterizes the spectral properties of general Bethe-Salpeter eigenvalue prob¬ 
lems. Theorem is a direct consequence of Theorem As a result, several existing algorithms 
(see 0 ) for solving Hamiltonian eigenvalue problems can be applied to the matrix Hr defined in ([^. 

The primary interest of this paper is drawn to the case when the property § holds. In this 
case, the Bethe-Salpeter eigenvalue problem is essentially a symmetric eigenvalue problem because 


\ln 0 1 


'A B 

1 

1 

0 
_1 


B A 


(9) 


is the product of two Hermitian matrices, and in addition, one of them is positive definite |16j . 
Therefore, H is diagonalizable and has real spectrum. In addition, the eigenvectors of H also admit 
special structures. These properties are summarized in the following theorem. 

Theorem 3. Let H be of the form Q satisfying ([^ and Q. Then there exist Xi, X 2 G C"^” 
and positive numbers Ai A 2 , ..., A„ € K such that 


where 


HX = X 




Y^H = 






Ai 

^ 2 ' 

, Y = 

■ Ai 

-A2' 

A 2 

^ 1 . 


-A 2 

. 


and A+ = diag {Ai,..., A„}. 


= /2„, 
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Proof. Let 


A B 
B A ■ 

It follows from 0 that the eigenvalue problem Hx = Ax is equivalent to Sx = A which is a 
generalized Hermitian-definite eigenvalue problem. It is well known (see, for example, m Section 
8.7]) that S and If are simultaneously diagonalizable by congruence transformationsand hence H 
is diagonalizable and has real eigenvalues. According to Theorem there exist positive numbers 
Ai A 2 , ..An such that H is similar to diag {A_|_, —A+}, where A+ = diag {Ai,..., A„}. 

To determine the structure of the eigenvectors of H, we start with any nonsingular matrix 



C = 


Cn 

C '21 


C 12 

^22 


^ ^2nx2n 


satisfying = l 2 n and C^SC = diag{A+,—A+} Then C ^HC = diag{A+, —A+}. By 

setting Xi = CuA^^ and A 2 = C' 2 iAy^, we obtain that 


and 


Ai' 


Ai' 

.^2_ 


.^ 2 , 


A"Ai - X^X2 = In, 


It is straightforward to verify that the following equations are equivalent to (10): 


■A2' 


a; 

.^1. 

1 

1 

.^1. 


A 


+ ) 


[X\',-X^]H = A+[X\',-X^], 
[-A2^Xl]i^ = -A+[-A2^Al]. 


( 10 ) 

( 11 ) 

( 12 ) 

(13) 

(14) 


Thus we have obtained all right and left eigenvectors of H. Finally, it follows from (101 and ( |14| 
that 

Ai 
A 2 


(Ai A 2 - A2Vi)A+ = [-A2^Al]i^ 


= -A+(AiA 2 -A2A1). 


Since A_|_ and —A_|_ have no common eigenvalue, the homogeneous Sylvester equation XA_|_ = —A.^_Z 
has a unique solution Z = 0. Hence X^ X 2 — X 2 Xi = 0. Combining this result with 0 , we 
conclude that Y^X = / 2 „. This completes the proof. □ 


to 


We now consider a relatively simple case in which is a real matrix. In this case 


([^ simplihes 


H = 


■ A 
-B 


B 

-A ’ 


(15) 


where both A and B are n x n real symmetric matrices. By definition is a real Hamiltonian 
matrix. Performing a symplectic orthogonal similarity transformation yields a block cyclic form 


1 


Ah.y 



0 A + B 

71 


^/2 



A-B 0 


^By congruence transformation, we mean a linear map on of the form X 1—> C^XC where C S jg 

nonsingular. 
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This suggests that the eigenvalues of H are the square roots of the eigenvalues of {A + B){A — B). 
Since both A and B are real, Q simplifies to that 


'A 

B 


B 

A 


^ 0 , 


or equivalently, both A + B and A — B are positive definite. Under this condition, the real Bethe- 
Salpeter eigenvalue problem is also known as a linear response eigenvalue problem which recently has 
attracted a lot of attention (see, for example, [21 [3]). In contrast to many recent developments [^126] 
in linear response eigenvalue problems that focus on large sparse eigensolvers, we develop dense 
eigensolvers for this eigenvalue problem in the next section. 


2.2 Tamm—Dancoff approximation 

When the off-diagonal blocks of the Bethe-Salpeter Hamiltonian are set to zero, known in the 
physics community as the Tamm-Dancoff approximation nniiiaiin], the Bethe-Salpeter eigenvalue 
problem reduces to a Hermitian eigenvalue problem. One can use efficient algorithms available in 
ScaLAPACK to compute eigenpairs of A. In many cases, the results are found to be sufficiently 
close to the eigenvalues of the full Bethe-Salpeter Hamiltonian and explain experiment. However, 
in general, this simplification can lead to noticeable difference in the computed spectrum. 

In this subsection, we show that Tamm-Dancoff approximation consistently overestimate the 
positive eigenvalues when the property ([^ holds. More precisely, we have 

X,iH) < A, (A) 


for j = 1, 2, ..., n, where Xj{-) denote the jth largest eigenvalue. That is, every positive eigenvalue 
obtained by TDA is greater than or equal to the corresponding one of H. This theoretical result is 
consistent with several computational experiments reported in mils]- However, we have not been 
able to find a rigorous proof of such a result in physics literature. To the best of our knowledge, 
this result is not well known in the numerical linear algebra community, and its proof is not entirely 
trivial. 

We provide a proof of this important property which we state clearly in Theorem Our proof 
makes use of the following lemma which appeared relatively recently in [B] . 

Lemma 1 (|B]). Let Ai, A 2 G C"^” be Hermitian positive definite. Then 


forj = 1,2, ..., n. 

With the help of this arithmetic-geometric inequality on eigenvalues, we prove the claim we 
have made in the following theorem. 

Theorem 4. Let H be as defined in Q . Then under the conditions and (§ , we have 

\,{H) < A, (A) 


for j = 1,2, ..., n. 
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Proof. Let 


■ Re(A) 

Im(A) 

B = 

■ Re(B) 

—Im(B) 

—Im(A) 

Re(A) 

—Im(B) 

-Re(B) 


Then 


1 


H 

'A B 

1 


71 



B A 

’ 71 



indicating that is equivalent to ^ + i? 0. Notice that 

A + B = jJ{A-B)Jr,. 


( 16 ) 


Therefore, ^ — B is also poshive _deHnite. In the proof of Theorem we have shown that H 
is unitarily similar to —iJn{A + B). Since the eigenvalues of H are real and appear in pairs 
{Xj{H),—Xj{H)}, we obtain 


0 < X,(H) = X,(-iJ„(A + B)) = ^A2, ([-iJ„(A + B)]2) = ^A2, ((Al-B)(A + B)). 
Applying Lemma yields 

^X2j{{A-B){A + B)) < X2,{A). 

Finally, since A(A) = A(A) U A(A) = A(A) U A(A), that is, the eigenvalues of A are the same as 
those of A with doubled multiplicity, we arrive at 

X 22 {A) = Xj (A). 


The theorem is thus proved. 


□ 


3 Algorithms and implementations 

In this section, we present structure preserving algorithms for solving the Bethe-Salpeter eigenvalue 
problem. As we have shown in Theorem Bethe-Salpeter eigenvalue problems are equivalent to 
real Hamiltonian eigenvalue problems. Thus any Hamiltonian eigensolver (see, for example, 0 ) 
can always be used to solve this type of eigenvalue problem. However, when (|^ holds, a more 
efficient algorithm can be used to solve the Bethe-Salpeter eigenvalue problem. Throughout this 
section the condition ^ is assumed to be satished. In most cases, only the positive eigenvalues 
and the corresponding eigenvectors are required for studying the properties of materials. We will 
demonstrate how to compute them efficiently. 


3.1 Complex Bethe-Salpeter eigenvalue problems 

From (|^, we can write 


Hx = Xx 


■/n 

0 ■ 


1 

'A 

B 

0 

-In 

X = 

A 

B 

A 


A straightforward approach is to feed this problem to a generalized Hermitian-definite eigensolver 
(for example, ZHEGV in LAPACK [T]). However, this approach is in general not structure preserving. 



















meaning that the computed eigenvalues and eigenvectors may not have the properties described in 
Theorem To see this, we analyze the algorithm implemented in LAPACK’s ZHEGV, which first 
computes the Cholesky factorization 


'A 

B 


B 

A 


= LL^ 


and then applies a standard Hermitian eigensolver to the transformed problem 


L 


In 

0 



L». 


Let L + AL be the computed Cholesky factor. Then the backward error of H is 


All —A21 

A21 A22 


In 

0 



(^{L + AL){L + ALf 


'A 

B aJ j’ 


which does not necessarily satisfy Ajj^ = A 21 and A 22 = An. Therefore, the structure of H is 
destroyed in the sense that the error introduced in the computed Cholesky factorization cannot be 
interpreted as a structured backward error of H. Consequently, the properties given in Theorem 
are lost. For instance, the eigenvalues of {L + AL) diag {/„, —/„} (L + AL) do not necessarily appear 
in positive and negative pairs. 

To develop a structure preserving approach, we make use of the observation we made in Sectionj^ 
We have observed that Q^HQ = —UnM, where 


Q 


1 

71 


^In 

iln ’ 


M = 


' Re(A + B) 
—Im(A + B) 


Im(A — B) 
Re(A - B) 


Notice that both J„ and M are real matrices. Thus by working with these matrices, we can avoid 
the use of complex arithmetic. The basic steps of our algorithm are outlined in Algorithm in 
which we only compute the positive eigenvalues and the corresponding eigenvectors. Once the 
matrix M has been constructed, we perform a Cholesky factorization, M = LLA to transform the 
non-Hermitian matrix, into the Hermitian matrix, —iLAJ^L. Let AL be the error in the 

computed Cholesky factor L. Then 


All A 21 
A21 A22 


{L + AL){L + AL)'^ - M 


is the backward error of M, and is real and symmetric. Similar to the proof of Theorem we set 


AA = 


All + A 


22 


.A^ 


21 


~ A 21 


AB = 


All — A22 


. A 21 + Aj] 


21 


Then we have (AA)'^ = AA, (AH)^ = AB, and 


All Aj^ 


■ Re(AA + AB) Im(AA - AB) 

_A21 A22_ 


-Im(AA + AB) Re(AA - AB) 


indicating that the backward error in the computed Cholesky factorization of M can be converted 
to a structured backward error of H. We remark that this analysis is valid even if AL is not lower 
triangular. 
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Algorithm 1 Algorithm for the complex Bethe-Salpeter eigenvalue problem 
Input: A = , B = g C"^" such that (|^ is satisfied. 

Output: Xi, X 2 G and A+ = diag {Ai,..., A„} satisfying 


H 


X' 


Ai' 

.^2. 


.^ 2 . 


A, 


+ ) 


Al^Ai - A2”A2 = In, 


and Ai > 0 for i = 1, 
1: Construct 


n. 


M = 


■ Re(A + B) 
—Im(A + B) 


Im(A — B) 
Re(A - B) 


2 

3 

4 

5 


Compute the 
Construct W 
Compute the 
Set 


Cholesky factorization M = LL^. 

= JnL, where Jn is defined in Q. 

spectral decomposition —iW = [Z+,Z-] diag {A+, —A+} [Z+, Z-]^. 


Ai' 


'In 

0 ■ 

.^2_ 


0 

-1 

1 


QLZ+A-^/^. 


In the next step we form lA JnL, which is a real skew-symmetric matrix whose spectrum is 
symmetric with respect to the origin. Applying a skew-symmetric eigensolver (for example, the one 
in [SniEI]) to this matrix preserves the structure of A{H) and avoids the explicit use of complex 
arithmetic. We will discuss the use of a skew-symmetric eigensolver in detail in Section 3.3 Since 
any 2n x 2n real skew-symmetric matrix is of the form C~^JnC, where C g ]j 2 Tix 2 Tt^ skew- 
symmetric backward errors introduced in the construction of and in the skew-symmetric 

eigensolver can be interpreted as a backward error (which does not need to be lower triangular) of 
L. We have shown that the error in L can be converted to a structured backward error of H. Thus 
this approach is structure preserving. As a remark, we mention that there exists an alternative 
approach that computes an SVD-like decomposition of L and avoids the explicit construction of 
LA JnL, see [33l|34j for details. 

Once the eigenvalues and eigenvectors of lA JnL have been computed, the eigenvectors of H can 
be recovered by simply accumulating all similarity transformations. Complex arithmetic can also 
be avoided in this step by carefully manipulating the real and imaginary parts of the eigenvectors. 
As we have shown in Theorem the left eigenvectors, as well as the eigenvectors corresponding to 
negative eigenvalues, can be restored with negligible effort according to (12)-(14) if needed. 


3.2 Real Bethe—Salpeter eigenvalue problems 

As we have seen in Section the real Bethe-Salpeter eigenvalue problem can be reduced to a 
product eigenvalue problem. Directly feeding A-\- B and A — B io a symmetric-definite eigensolver 
squares the eigenvalues and can potentially spoil the accuracy of the eigenvalues and eigenvectors. 
When the accuracy requirement is high, an alternative approach is needed. Suppose that A-\- B = 
LiLJ, A — B — L 2 LJ. Then (A + B){A — B) is similar to (AJLi)(LJLi)^. Notice that the spectral 
decomposition of (LJLi)(LJLi)^ can be obtained from the singular value decomposition of LJLi. 
Theorem [5] summarizes this observation. 
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Algorithm 2 Algorithm for the real Bethe-Salpeter eigenvalue problem 
Input: A = A^,B = B'^€ such that A + B y 0 and A- B y 0. 

Output: Xi, X 2 G and A+ = diag {Ai,..., A„} satisfying 


H 


Al' 


Ai' 

.^2. 


.^2. 


A 




XjXi - XjX2 = In, 


and Ai > 0 for i = 1, ..n. 

1: Compute the Cholesky factorizations A + B = LiLj, A — B = ^ 2 ^ 2 - 
2: Compute the singular value decomposition LJLi = UA+V~''. 

3: Set Ai = L 2 U + LiV, X 2 = L 2 U - LiV. 


Theorem 5 . Let H be defined by ( 151 satisfying that A + ByO, A—B'^ 0 . Suppose that 
LJLi = t 7 A_|_y^ is the singular value decomposition of L^Li, where Li, L2 G satisfy that 

LiLJ = A + B and L2L2 = A — B. Then the spectral decomposition of H is given by 


H = X 


■A+ 

0 


0 

-A+ 


Y' 


where 


and 


1 

'L2U + L1V 

L2U-L1V 


A7^/^ 0 

2 

L2U-L1V 

L2U + L1V 


1 

0 ' 

> 

+ 1 


Y = X-' = - 


T_ 1 

'L1V + L2U L1V-L2U 

> 

+ 1 

0 

2 

L1V-L2U L1V + L2U 

0 


Proof. It suffices to verify that HX = A diag {A+, —A+} and Y^X = fi 
simple algebraic manipulations. 


2 n, which only require 

□ 


Based on this theorem, we propose Algorithm as a real Bethe-Salpeter eigensolver. We 
remark that such an eigensolver can also be used as a dense kernel within structure preserving 
projection methods for linear response eigenvalue problems mm. Just like the complex case, the 
negative eigenvalues and the corresponding eigenvectors, if needed, can be easily constructed from 
the outputs of Algorithm according to Theorems and 


3.3 Parallel implementations 

To solve large scale Bethe-Salpeter eigenvalue problems arising from quantum physics, paralleliza¬ 
tion of Algorithms and on distributed memory systems is required in practical computations. 
All 0{n^) operations in Algorithm consist of basic linear algebra operations, and can be accom¬ 
plished by calling linear algebra libraries such as BLAS/LAPACK. There also exist ScaLAPACK 
subroutines for these operations, which allow us to parallelize the algorithm in a straightforward 
manner. So we will mainly discuss implementation issues for Algorithm]^ 

The main obstacle to efficiently implementing Algorithm is the lack of a skew-symmetric 
eigensolver in LAPACK/ScaLAPACK. The algorithm described in [30l[3T] is based on level 1 BLAS 
operations, hence is not efficient on modern architectures with memory hierarchy. Therefore, we 
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Algorithm 3 Parallel implementation of steps and in Algorithm 
1: Tridiagonal reduction: W = UTU~^. 

2: Compute the positive spectral decomposition —iD^TD = P_] diag {A+, — A+j P_]^, 

where D — diag {l, i, i^i • ■ ■) i^"}- 

3: Construct = LU by applying a sequence of Householder reflections. 

4: Construct Y = Q^D. 

5: Construct F+ = HV+ using PBLAS. 

6: Set 

In ^ V 


X2 


have to implement step in Algorithm [l] by ourselves. To make use of ScaLAPACK as much 
as possible, we propose the following strategy shown in Algorithm Several remarks on the 
implementation are in order: 


1. Tridiagonal reduction of a skew-symmetric matrix can be accomplished by applying a se¬ 
quence of Householder reflections. This is in fact slightly simpler compared to reducing a 
symmetric matrix to tridiagonal form. We implemented a modified version of ScaLAPACK’s 
PDSYTRD to achieve this goal. Several BLAS/PBLAS-like subroutines for skew-symmetric 
matrix operations are also implemented. 


2. Suppose that 

{ ai 

0 

-ai 

Then 

{ Oil 

0 

ai 


Ol2n-l 


0^ 


-<^2n-l 


<X2n-l 


<X2n-l 


is a real symmetric tridiagonal matrix, whose spectral decomposition can be easily computed 
by calling ScaLAPACK. This technique is essentially the same as the one in m- We then use 
the bisection method (PDSTEBZ/PDSTEIN) to compute the positive eigenvalues and the corre¬ 
sponding eigenvectors. The eigenvectors are reorthogonalized when the accuracy requirement 
is high. 


3. Step 1^ in Algorithm is not performed by PBLAS. Because Q and D are both known, the 
application of these unitary transformations can be accomplished with 0{n^) operations. 

4. In Stepj^of Algorithm]^ we separate the computation for real and imaginary parts: Re(A+) = 
Re(A)Vl)., Im(A_|_) = Im(A)Vl)_. This is based on the fact that V+ is real, and one PZGEMM 
call is about twice as expensive as two PDGEMM calls. 


Finally, we remark that our parallel algorithms/implementations are just proof-of-concept. 
There is certainly room for improvement. For example, our tridiagonal reduction step is mod¬ 
ified from ScaLAPACK’s PDSYTRD, which is relatively simple, but is not state-of-the-art. Many 
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modern implementations of symmetric tridiagonal reduction are based on successive band reduc¬ 
tion [3 HOI US]- The successive band reduction technique extends naturally to skew-symmetric 
matrices. There are also many alternative tridiagonal eigensolvers other than the bisection method, 
for example, the MRRR method mM- But improvements in these directions exceed the scope 
of this paper. 


4 Numerical experiments 

In this section, we present numerical examples for three matrices obtained from discretized Bethe- 
Salpeter equations. The numerical experiments are performed on the Cray XE6 machine. Hopper, 
at the National Energy Research Scientific Computing Center (NERSC)j^Each Hopper node con¬ 
sists of two twelve-core AMD ‘MagnyCours’ 2.1-GHz processors, and has 32 GB DDRS 1333-MHz 
memory. Each core has its own LI and L2 caches, with 64 KB and 512 KB, respectively. Hopper’s 
compute nodes are connected via a 3D torus Cray Gemini network with a maximum bandwidth of 
8.3 gigabytes per second. The internode latency ranges from 1.27 microseconds to 1.38 microsec¬ 
onds. The latency between cores is less than 1 microsecond. 

The examples we use here correspond to discretized Bethe-Salpeter Hamiltonians associated 
with naphthalene, gallium arsenide (GaAs), and boron nitride (BN), respectively. The dimensions 
of these 2n x 2n Hamiltonians are 64 x 64, 256 x 256, and 4608 x 4608, respectively. We implemented 
Algorithm in Fortran 90, using the message passing interface (MPI) for parallelization. We used 
ScaLAPACK to perform some basic parallel matrix computations. No multithreading features such 
as OpenMP or threaded linear algebra libraries are used. To make fair comparisons, all eigenvalues 
and eigenvectors are calculated. 

We first compare our implementation of Algorithm]^ with LAPAGK’s non-Hermitian eigensolver 
ZGEEV. Test runs are performed using a single core so that both solvers are sequential. From Table [l] 
we see that both solvers produce solutions with small residuals, and Algorithm is in general more 
accurate. We also compute the eigenvalues of A using LAPACK’s Hermitian eigensolver ZHEEV. In 
Figure we plot the spectral density of the computed eigenvalues with and without TDA. The 
delta function in ([^ is approximated using the Gaussian function, that is. 



with standard deviation cr = 5 x 10“^. The figure illustrates the difference between A(A) and 
A{H) for the naphthalene example. We observe up to 12% relative differences in eigenvalues in 
this example when TDA is used. We also see from the figure that A (A) is always to the right 
of A(i7). For the other two examples, the error introduced by TDA are up to 1.2% and 0.93%, 
respectively. Computational results confirm that all eigenvalues obtained by TDA are larger than 
the true eigenvalues as predicted by Theorem We remark that ZGEEV produces quite accurate 
eigenvalues in these examples, despite the fact that all computed eigenvalues are not real. 

Table contains the execution time of different approaches. Algorithm is about five times 
faster than the non-Hermitian solver ZGEEV. An interesting observation is that TDA is not much 
faster than our full-BSE solver, especially when n gets large. This is not a surprising result. In 
fact, the major cost of Algorithm is diagonalizing a real 2n x 2n skew-symmetric matrix, which 
is comparable to the cost of diagonalizing a complex n x n Hermitian matrix. 

^https://www.nersc.gov/users/computational-systems/hopper/ 
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Table 1: Comparison between ZGEEV and Algorithm [T| 




naphthalene 
n = 32 

GaAs 
n = 128 

BN 

n = 2304 

ZGEEV 

\\Y'^HX-A\\f/\\H\\f 

\\Y^X-hn\\F/V^ 

3.8 X lO-i'^ 

2.8 X 10-15 

8.5 X 10-15 

9.0 X 10-15 

2.0 X 10-11 

2.1 X 10-1^ 

Algorithm 

\\Y^HX-A\\p/\\H\\f 

WY^X-hJp/V^ 

1.5 X 10-15 
1.1 X 10-15 

3.3 X 10-15 
3.1 X 10-15 

5.4 X 10-15 
4.3 X 10-15 


Spectral density 



Figure 1: Comparison of the spectral density, defined by Equation (§, for the full Bethe-Salpeter 
eigenvalue problem and the BSE constructed without the off-diagonal blocks, with the Tamm- 
Dancoff approximation. 


Table 2: Execution time of different approaches. 




naphthalene 

GaAs 

BN 



n = 32 

n = 128 

n = 2304 

ZGEEV 


1.8 X 10-^ 

4.8 X 10-1 

1.2 X 105 

Algorithm 

1 

4.1 X 10-5 

7.6 X 10-^ 

1.6 X 10^ 

ZHEEV (TDA.) 

1.6 X 10-5 

5.2 X 10-^ 

2.5 X 10^ 


Einally, we perform a simple scalability test on our parallel implementation of Algorithm We 
use the BN example since it is of moderate size. Test runs are performed on 1 x 1, 2 x 2, ..., 9 x 9, 
10 X 10 processor grids, with block factor ni, = 64 for the 2D block cyclic data layout. In Eigurej^ 
we illustrate the overall execution time, as well as the performance profile for each component of 
the algorithm. We can see from the figure that all components scale similarly in this example. This 
indicates that Algorithm scales reasonably well as the overall parallel scalability is close to that 
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Parallel scalability 



Figure 2: Parallel scalability of Algorithmfor BN {H S 


of the PDGEMM component. 

5 Conclusions 

We showed, in this paper, that the Bethe-Salpeter eigenvalue problem is equivalent to a real 
Hamiltonian eigenvalue problem and can always be solved by an efficient Hamiltonian eigensolver. 
For complex Bethe-Salpeter Hamiltonians that satisfy the additional property ®) , which almost 
always holds in practice, it is possible to compute all of its eigenpairs by a structure preserving 
algorithm we developed in this paper. When the Hamiltonian is real, we can turn the eigenvalue 
problem into a product eigenvalue problem. We presented an efficient and reliable way to solve this 
eigenvalue problem. 

When the Tamm-Dancoff approximation is used, the Bethe-Salpeter eigenvalue problem reduces 
to a Hermitian eigenvalue problem that can be solved by existing tools in the ScaLAPACK library. 
We showed that Tamm-Dancoff approximation always overestimate all eigenvalues. 

We presented numerical algorithms that demonstrated the accuracy and efficiency of our algo¬ 
rithm. However, we should point out that our parallel implementation is preliminary, and there is 
plenty of room for improvements. 
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